Asymptotic optimality of likelihood-based cross-validation.
Abstract
Likelihood-based cross-validation is a statistical tool for selecting, from a collection of candidate density estimators, a density estimate based on n i.i.d. observations from the true density. Typical examples are the selection of a model indexing a maximum likelihood estimator and the selection of a bandwidth indexing a nonparametric (e.g. kernel) density estimator. In this article, we establish a finite sample result for a general class of likelihood-based cross-validation procedures (as indexed by the type of sample splitting used, e.g. V-fold cross-validation). This result implies that the cross-validation selector performs asymptotically as well (with respect to the Kullback-Leibler distance to the true density) as a benchmark model selector that is optimal for each given dataset and depends on the true density. Crucial conditions of our theorem are that the size of the validation sample converges to infinity, which excludes leave-one-out cross-validation, and that the candidate density estimates are bounded away from zero and infinity. We illustrate these asymptotic results and the practical performance of likelihood-based cross-validation for bandwidth selection with a simulation study. Moreover, we apply likelihood-based cross-validation to regulatory motif detection in DNA sequences.
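The procedure can be sketched for kernel bandwidth selection. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It uses a one-dimensional Gaussian kernel and random fold assignment. The helper names (`make_kde`, `vfold_likelihood_cv`) and the `floor` parameter are hypothetical.

For each bandwidth h, the code does three things:
- fits the estimator on the data outside each fold;
- averages the log-density over the held-out points;
- keeps the h with the largest average across folds.

With V fixed, each validation sample has roughly n/V points and so grows with n, as the theorem requires. Leave-one-out (V = n) would not meet that condition.

```python
import math
import random


def make_kde(train, h):
    """Gaussian kernel density estimate with bandwidth h fitted to `train` (illustrative helper)."""
    norm = 1.0 / (len(train) * h * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in train)

    return density


def vfold_likelihood_cv(data, bandwidths, v=5, floor=1e-12, seed=0):
    """Select the bandwidth maximizing the V-fold cross-validated log-likelihood.

    Returns (best_bandwidth, scores), where scores[h] is the CV criterion for h.
    Assumes len(data) >= v so that every validation fold is non-empty.
    """
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::v] for k in range(v)]  # disjoint validation samples covering all points
    scores = {}
    for h in bandwidths:
        total = 0.0
        for fold in folds:
            held_out = set(fold)
            # Fit the candidate estimator on the training sample only.
            f_hat = make_kde([data[i] for i in idx if i not in held_out], h)
            # Mean log-density on the validation sample. The floor keeps log()
            # finite; it is a crude stand-in (an assumption) for the condition
            # that candidate densities are bounded away from zero.
            total += sum(math.log(max(f_hat(data[i]), floor)) for i in fold) / len(fold)
        scores[h] = total / v
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    rng = random.Random(1)
    sample = [rng.gauss(0.0, 1.0) for _ in range(200)]
    grid = [0.05, 0.1, 0.2, 0.4, 0.8, 1.6]
    h_star, cv = vfold_likelihood_cv(sample, grid, v=5)
    print("selected bandwidth:", h_star)
```

The benchmark selector in the abstract would instead rank bandwidths by Kullback-Leibler distance to the true density. That ranking can only be computed in simulations where the true density is known.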
Similar resources
Asymptotic optimality of full cross-validation for selecting linear regression models
For the problem of model selection, full cross-validation has been proposed as an alternative criterion to traditional cross-validation, particularly in cases where the latter is not well defined. To justify the use of the new proposal, we show that under some conditions both criteria share the same asymptotic optimality property when selecting among linear regression models.
A leave-p-out based estimation of the proportion of null hypotheses (arXiv:0804.1189v1 [math.ST], 8 Apr 2008)
In the multiple testing context, a challenging problem is the estimation of the proportion π0 of true null hypotheses. Many estimators of this quantity rely on identifiability assumptions that either appear to be violated on real data or could at least be relaxed. Under independence, we propose an estimator π̂0 based on density estimation using both histograms and cross-validation...
Asymptotic Efficiencies of the MLE Based on Bivariate Record Values from Bivariate Normal Distribution
Abstract. Maximum likelihood (ML) estimation based on bivariate record data is considered as the general inference problem. Assume that the process of observing k records is repeated m times, independently. The asymptotic properties, including consistency and asymptotic normality, of the ML estimates of the parameters of the underlying distribution are then established when m is ...
On the Minimax Optimality of Block Thresholded Wavelets Estimators for ?-Mixing Process
We propose a wavelet-based estimator of the regression function for a sequence of ?-mixing random variables with a common one-dimensional probability density function. Some asymptotic properties of the proposed estimator based on block thresholding are investigated. It is found that the estimators achieve optimal minimax convergence rates over large class...
Slope heuristics and V-Fold model selection in heteroscedastic regression using strongly localized bases
We investigate the optimality for model selection of the so-called slope heuristics, V-fold cross-validation and V-fold penalization in a heteroscedastic regression context with random design. We consider a new class of linear models, which we call strongly localized bases, that generalize histograms, piecewise polynomials and compactly supported wavelets. We derive sharp oracle inequalities ...
Journal title:
- Statistical Applications in Genetics and Molecular Biology
Volume 3, Issue
Pages -
Publication date 2004